Skip to content

Conversation

@kentqian
Copy link
Contributor

@kentqian kentqian commented Aug 20, 2025

Motivation

Given a heuristic parsing solution for AddN op when trying to parse tf and support the concat op when having the mix shapes

Technical Details

Change the chain addition to reduce_sum op for parsing AddN op

If there is a mix of static and dynamic shapes, set everything to dynamic, then at the end, contract the shape back to static if possible. It also calculates the common non axis dims to bound the output. (Concat)

Test Plan

Add test cases in ref and tf/parse

  • test/tf/tests/addn_test.cpp
  • test/ref/add.cpp

Test Result

Submission Checklist

@kentqian kentqian requested review from eddieliao and kahmed10 August 20, 2025 19:38
@kentqian kentqian self-assigned this Aug 20, 2025
@kentqian kentqian added enhancement New feature or request bugfix Fixes a bug found in the code. labels Aug 20, 2025
@kentqian kentqian marked this pull request as ready for review August 28, 2025 22:51
@kentqian kentqian requested a review from causten as a code owner August 28, 2025 22:51
@kentqian kentqian changed the title AddN use the heuristic using reduce_sum op for tensorflow Add the heuristic of AddN op using reduce_sum op for parsing pb file (TF) Aug 28, 2025
@migraphx-bot
Copy link
Collaborator

Test Batch Rate new
cd7e8a
Rate old
397919
Diff Compare
torchvision-resnet50 64 3,160.32 3,245.86 -2.64%
torchvision-resnet50_fp16 64 6,595.68 6,951.81 -5.12% 🔴
torchvision-densenet121 32 2,434.48 2,449.22 -0.60%
torchvision-densenet121_fp16 32 4,100.40 4,167.34 -1.61%
torchvision-inceptionv3 32 1,665.16 1,635.29 1.83%
torchvision-inceptionv3_fp16 32 2,582.21 2,759.38 -6.42% 🔴
cadene-inceptionv4 16 794.19 770.72 3.05% 🔆
cadene-resnext64x4 16 802.60 817.99 -1.88%
slim-mobilenet 64 8,210.79 7,456.32 10.12% 🔆
slim-nasnetalarge 64 221.72 210.95 5.11% 🔆
slim-resnet50v2 64 3,297.22 3,341.58 -1.33%
bert-mrpc-onnx 8 1,131.58 1,144.86 -1.16%
bert-mrpc-tf 1 480.89 445.07 8.05% 🔆
pytorch-examples-wlang-gru 1 297.10 299.79 -0.90%
pytorch-examples-wlang-lstm 1 412.36 399.30 3.27% 🔆
torchvision-resnet50_1 1 798.59 761.18 4.92% 🔆
cadene-dpn92_1 1 411.51 384.27 7.09% 🔆
cadene-resnext101_1 1 360.78 391.94 -7.95% 🔴
onnx-taau-downsample 1 396.77 395.58 0.30%
dlrm-criteoterabyte 1 31.90 33.78 -5.55% 🔴
dlrm-criteoterabyte_fp16 1 50.94 51.23 -0.56%
agentmodel 1 8,718.02 9,034.65 -3.50% 🔴
unet_fp16 2 58.73 59.18 -0.77%
resnet50v1_fp16 1 976.71 989.25 -1.27%
resnet50v1_int8 1 970.28 1,022.00 -5.06% 🔴
bert_base_cased_fp16 64 1,109.36 1,106.73 0.24%
bert_large_uncased_fp16 32 343.68 345.26 -0.46%
bert_large_fp16 1 197.32 197.13 0.09%
distilgpt2_fp16 16 2,096.42 2,115.80 -0.92%
yolov5s 1 581.41 576.03 0.93%
tinyllama 1 43.75 43.97 -0.49%
vicuna-fastchat 1 45.06 45.28 -0.48%
whisper-tiny-encoder 1 409.30 417.53 -1.97%
whisper-tiny-decoder 1 411.18 408.53 0.65%
llama2_7b 1 19.11 19.16 -0.26%
qwen1.5-7b 1 23.44 23.51 -0.32%
phi3-3.8b 1 26.53 26.67 -0.53%
mask-rcnn 1 12.08 12.01 0.60%
llama3-8b 1 21.65 21.72 -0.32%
whisper-large-encoder 1 10.16 10.21 -0.48%
whisper-large-decoder 1 96.84 95.77 1.12%
mistral-7b 1 23.62 23.72 -0.41%
FLUX.1-schnell 1 713.58 746.70 -4.44% 🔴
nan nan nan nan nan%

This build is not recommended to merge 🔴

@migraphx-bot
Copy link
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

❌bert-mrpc-tf: ERROR - check error outputerror: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

2025-08-29 00:03:16.115752: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
graph = load_tf_graph(model_name)
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
graph_def.ParseFromString(f.read())
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
self._preread_check()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')


     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

     ✅ llama2_7b: PASSED: MIGraphX meets tolerance

     ✅ qwen1.5-7b: PASSED: MIGraphX meets tolerance

     ✅ phi3-3.8b: PASSED: MIGraphX meets tolerance

🔴mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ llama3-8b: PASSED: MIGraphX meets tolerance

     ✅ whisper-large-decoder: PASSED: MIGraphX meets tolerance

     ✅ mistral-7b: PASSED: MIGraphX meets tolerance

     ✅ FLUX.1-schnell: PASSED: MIGraphX meets tolerance

@causten causten requested a review from Copilot September 2, 2025 23:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a heuristic optimization for parsing TensorFlow's AddN operation by switching from chain addition to reduce_sum for cases with many input tensors (>=5). Additionally, it enhances the concat operation to handle mixed static and dynamic shapes by converting everything to dynamic and then contracting back to static when possible.

  • Implements a heuristic in AddN parsing that uses concat+reduce_sum+squeeze for 5+ inputs instead of chained addition
  • Updates concat operation to handle mixed static/dynamic shapes by unifying to dynamic then contracting to static when possible
  • Adds comprehensive test coverage for the new AddN heuristic with 10 input tensors

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/tf/parse_addn.cpp Implements heuristic to use reduce_sum approach for AddN with 5+ inputs
src/include/migraphx/op/concat.hpp Enhances concat to handle mixed static/dynamic shapes and compute common dimensions
test/tf/tests/addn_test.cpp Adds test case for AddN with 10 elements using the new heuristic
test/tf/gen_tf_pb.py Adds generator function for the new AddN test case
test/tf/models/addn_with_many_elements_test.pb Binary protobuf model for testing AddN with many elements
test/ref/add.cpp Adds reference test for the reduce_sum approach to AddN

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

if(args.size() == 1)
return args[0];

if(args.size() < 5) // using heuristic when args exceed over 5 elements
Copy link

Copilot AI Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is inconsistent with the condition. The condition checks for 'less than 5' but the comment says 'when args exceed over 5 elements'. The comment should read 'using chain addition when args are less than 5 elements' or similar.

Suggested change
if(args.size() < 5) // using heuristic when args exceed over 5 elements
if(args.size() < 5) // using chain addition when args are less than 5 elements

Copilot uses AI. Check for mistakes.
Comment on lines +121 to +129
if(has_static and has_dynamic)
{
// Dynamic input shapes
for(std::size_t index = 0; index < inputs[0].ndim(); index++)
for(auto& input : inputs)
{
if(index != axis)
{
if(not std::all_of(inputs.begin(), inputs.end(), [&](const shape& s) {
return s.dyn_dims()[index] == inputs[0].dyn_dims()[index];
}))
MIGRAPHX_THROW("CONCAT: all input dimensions should match in axis " +
std::to_string(index));
}
}
std::size_t new_min = 0;
std::size_t new_max = 0;
for(const auto& input : inputs)
{
auto ddim = input.dyn_dims()[axis];
new_min += ddim.min;
new_max += ddim.max;
if(input.dynamic())
continue;
input = input.to_dynamic();
}
}
Copy link

Copilot AI Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying the inputs parameter directly can lead to unexpected side effects since it's passed by value but contains references to shapes. This modification affects the original shapes which may be used elsewhere. Consider creating a local copy of inputs or using a different approach to handle mixed shapes.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Fixes a bug found in the code. enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants